BLAS(3F)                                                            BLAS(3F)


NAME
     BLAS - Basic Linear Algebra Subprograms

DESCRIPTION
     BLAS is a library of routines that perform basic operations involving
     matrices and vectors.  They were designed as a way of achieving
     efficiency in the solution of linear algebra problems.  The BLAS, as
     they are now commonly called, have been very successful and have been
     used in a wide range of software, including LINPACK, LAPACK and many
     of the algorithms published by the ACM Transactions on Mathematical
     Software.  They are an aid to clarity, portability, modularity and
     maintenance of software, and have become the de facto standard for
     elementary vector and matrix operations.

     The BLAS promote modularity by identifying frequently occurring
     operations of linear algebra and by specifying a standard interface to
     these operations.  Efficiency is achieved through optimization within
     the BLAS without altering the higher-level code that references them.

     There are three levels of BLAS.  The original set of BLAS, commonly
     referred to as the Level 1 BLAS, perform low-level operations such as
     dot products and adding a multiple of one vector to another.
     Typically these operations involve O(N) floating point operations and
     O(N) data items moved (loaded or stored), where N is the length of the
     vectors.  The Level 1 BLAS permit efficient implementation on scalar
     machines, but the ratio of floating-point operations to data movement
     is too low to achieve effective use of most vector or parallel
     hardware.
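
     As an illustration of what a Level 1 routine does, the axpy operation
     y = a*x + y can be modelled in plain C as in the sketch below.  The
     function name is hypothetical and unit stride is assumed; the library
     routines themselves (e.g. DAXPY) also handle arbitrary increments and
     are optimized for the hardware.

          /* Reference model of the Level 1 axpy operation, y := a*x + y,
           * with unit stride.  Illustrative sketch only. */
          void ref_daxpy(int n, double a, const double *x, double *y)
          {
              int i;
              for (i = 0; i < n; i++)
                  y[i] += a * x[i];      /* O(N) flops, O(N) data moved */
          }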

     The Level 2 BLAS perform matrix-vector operations that occur
     frequently in the implementation of many of the most common linear
     algebra algorithms.  They involve O(N^2) floating point operations.
     Algorithms that use Level 2 BLAS can be very efficient on vector
     computers, but they are not well suited to computers with a hierarchy
     of memory (such as cache memory).
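
     For example, the matrix-vector product y = alpha*A*x + beta*y behind
     the MV entries in the tables below can be modelled roughly as follows
     (hypothetical function name; column-major storage as described in the
     C INTERFACE section).  The sketch only illustrates the O(N^2) work
     involved, not the library implementation.

          /* Reference model of a Level 2 matrix-vector product,
           * y := alpha*A*x + beta*y, where A is m-by-n and stored column
           * major with leading dimension lda.  Illustrative sketch only. */
          void ref_dgemv(int m, int n, double alpha, const double *a,
                         int lda, const double *x, double beta, double *y)
          {
              int i, j;
              for (i = 0; i < m; i++)
                  y[i] *= beta;
              for (j = 0; j < n; j++)           /* walk down column j of A */
                  for (i = 0; i < m; i++)
                      y[i] += alpha * a[j*lda + i] * x[j];
          }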

     The Level 3 BLAS are targeted at matrix-matrix operations.  These
     operations generally involve O(N^3) floating point operations while
     creating only O(N^2) data movement.  They permit efficient reuse of
     data that resides in cache and produce what is often called the
     surface-to-volume effect in the ratio of computation to data movement.
     In addition, matrices can be partitioned into blocks, operations on
     distinct blocks can be performed in parallel, and within the operation
     on each block, scalar or vector operations may be performed in
     parallel.
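
     A rough sketch of the blocking idea in plain C is given below.  The
     function name, block size and loop order are arbitrary choices for the
     example, not the tuning used in the library; the point is that each
     block of C, together with the corresponding blocks of A and B, is
     reused many times while it resides in cache.

          /* Illustration of blocking for a matrix-matrix product,
           * C := C + A*B, with all matrices n-by-n and stored column
           * major with leading dimension n.  NB is an arbitrary block
           * size chosen for the example. */
          #define NB 64

          void blocked_matmul(int n, const double *a, const double *b,
                              double *c)
          {
              int ii, jj, kk, i, j, k;
              for (jj = 0; jj < n; jj += NB)
                for (kk = 0; kk < n; kk += NB)
                  for (ii = 0; ii < n; ii += NB)
                    /* accumulate the (ii,kk) block of A times the (kk,jj)
                     * block of B into the (ii,jj) block of C */
                    for (j = jj; j < jj + NB && j < n; j++)
                      for (k = kk; k < kk + NB && k < n; k++)
                        for (i = ii; i < ii + NB && i < n; i++)
                          c[j*n + i] += a[k*n + i] * b[j*n + k];
          }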

     The BLAS2 and BLAS3 modules have been optimized and parallelized to
     take advantage of SGI's parallel RISC architecture.  The best
     performance is achieved by the BLAS3 routines (e.g. DGEMM), where
     "outer-loop" unrolling and "blocking" techniques were applied to take
     advantage of the memory cache.  The performance of BLAS2 routines
     (e.g. DGEMV) is sensitive to the size of the problem; for large
     problems the high cache-miss rate slows the algorithms down.  LAPACK
     algorithms preferentially use BLAS3 modules and are the most
     efficient.  LINPACK uses only BLAS1 modules and is therefore less
     efficient than LAPACK.

     To link with "libblas", it is advised to use "f77", which loads all
     the required Fortran libraries; otherwise, include -lftn in your link
     line.  For R8000 and R10000 based machines, you should use the mips4
     version.  This is accomplished by using -mips4 when linking:
          f77 -mips4 -o foobar.out foo.o bar.o -lblas
     To use the parallelized version, use
          f77 -mips4 -mp -o foobar.out foo.o bar.o -lblas_mp

SUMMARY
     BLAS Level 1:
          .....function......         ....prefix,suffix.....   rootname
          dot product                 s- d- c-u c-c z-u z-c    -dot-
          y = a*x + y                 s- d- c- z-              -axpy
          setup Givens rotation       s- d-                    -rotg
          apply Givens rotation       s- d- cs- zd-            -rot
          copy x into y               s- d- c- z-              -copy
          swap x and y                s- d- c- z-              -swap
          Euclidean norm              s- d- sc- dz-            -nrm2
          sum of absolute values      s- d- sc- dz-            -asum
          x = a*x                     s- d- cs- c- zd- z-      -scal
          index of max abs value      is- id- ic- iz-          -amax

     BLAS Level 2:
          MV    Matrix vector multiply
          R     Rank one update to a matrix
          R2    Rank two update to a matrix
          SV    Solving certain triangular matrix problems.

     single precision Level 2 BLAS      |  Double precision Level 2 BLAS
     -----------------------------------------------------------------------
           MV   R    R2   SV            |        MV   R    R2   SV
     SGE   x    x                       |  DGE   x    x
     SGB   x                            |  DGB   x
     SSP   x    x    x                  |  DSP   x    x    x
     SSY   x    x    x                  |  DSY   x    x    x
     SSB   x                            |  DSB   x
     STR   x              x             |  DTR   x              x
     STB   x              x             |  DTB   x              x
     STP   x              x             |  DTP   x              x

     complex Level 2 BLAS                  |  Double precision complex Level 2 BLAS
     ---------------------------------------------------------------------------
           MV   R    RC   RU   R2   SV     |        MV   R    RC   RU   R2   SV
     CGE   x         x    x                |  ZGE   x         x    x
     CGB   x                               |  ZGB   x
     CHE   x    x              x           |  ZHE   x    x              x
     CHP   x    x              x           |  ZHP   x    x              x
     CHB   x                               |  ZHB   x
     CTR   x                        x      |  ZTR   x                        x
     CTB   x                        x      |  ZTB   x                        x
     CTP   x                        x      |  ZTP   x                        x

     BLAS Level 3:
          MM    Matrix matrix multiply
          RK    Rank-k update to a matrix
          R2K   Rank-2k update to a matrix
          SM    Solving triangular systems with many right-hand sides
                (a sketch of this operation follows the tables below).

     single precision Level 3 BLAS      |  Double precision Level 3 BLAS
     -----------------------------------------------------------------------
           MM   RK   R2K  SM            |        MM   RK   R2K  SM
     SGE   x                            |  DGE   x
     SSY   x    x    x                  |  DSY   x    x    x
     STR   x              x             |  DTR   x              x

     complex Level 3 BLAS               |  Double precision complex Level 3 BLAS
     -----------------------------------------------------------------------
           MM   RK   R2K  SM            |        MM   RK   R2K  SM
     CGE   x                            |  ZGE   x
     CSY   x    x    x                  |  ZSY   x    x    x
     CHE   x    x    x                  |  ZHE   x    x    x
     CTR   x              x             |  ZTR   x              x
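
     As an illustration of the SM operation listed above, the sketch below
     solves a lower triangular system L*X = B with several right-hand sides
     by forward substitution, overwriting B with X.  The function name is
     hypothetical, and the lower triangular, non-transposed, non-unit case
     is only one of the variants the library routines support.

          /* Reference model of a Level 3 triangular solve with many
           * right-hand sides.  L is n-by-n lower triangular, B holds nrhs
           * right-hand sides; both are column major with the given
           * leading dimensions.  Illustrative sketch only. */
          void ref_trsm_lower(int n, int nrhs, const double *l, int ldl,
                              double *b, int ldb)
          {
              int i, j, k;
              for (j = 0; j < nrhs; j++) {    /* one right-hand side at a time */
                  double *bj = b + j*ldb;     /* column j of B */
                  for (k = 0; k < n; k++) {   /* forward substitution */
                      bj[k] /= l[k*ldl + k];
                      for (i = k + 1; i < n; i++)
                          bj[i] -= l[k*ldl + i] * bj[k];
                  }
              }
          }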

C INTERFACE
     There is a C interface for the BLAS library.  The implementation is
     based on the proposed specification for BLAS routines in C [1].

     The argument lists closely follow the equivalent Fortran ones.  The
     main changes are that enumeration types are used instead of character
     types for option specification, and that two-dimensional arrays are
     stored in one-dimensional C arrays in the same fashion as a Fortran
     array (column major).  Therefore, a matrix A would be stored as:

          double (*a)[lda*n];  /* a is a pointer to an array of size lda*n */

     where the element A(i+1,j) of the matrix A is stored immediately after
     the element A(i,j), while A(i,j+1) is lda elements apart from A(i,j).
     The element A(i,j) of the matrix can be accessed directly by reference
     to a[ (j-1)*lda + (i-1) ].
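
     A small, self-contained sketch of this indexing convention (the macro,
     matrix sizes and values are purely illustrative):

          #include <stdio.h>
          #include <stdlib.h>

          /* Column-major access: the Fortran element A(i,j), with 1-based
           * indices, lives at a[(j-1)*lda + (i-1)]. */
          #define A(i, j) a[((j) - 1) * lda + ((i) - 1)]

          int main(void)
          {
              int lda = 4, n = 3, i, j;               /* a 4-by-3 matrix */
              double *a = malloc((size_t)(lda * n) * sizeof(*a));

              for (j = 1; j <= n; j++)                /* fill A(i,j) = 10*i + j */
                  for (i = 1; i <= lda; i++)
                      A(i, j) = 10.0 * i + j;

              printf("A(2,3) = %g\n", a[(3 - 1) * lda + (2 - 1)]);  /* 23 */
              free(a);
              return 0;
          }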

     The names of the C versions of the BLAS are the same as the Fortran
     versions, since the compiler puts the Fortran names in upper case and
     adds an underscore after the name.

     The argument lists use the following data types:

          Integer:  an integer data type of 32 bits.
          float:    the regular single precision floating-point type.
          double:   the regular double precision floating-point type.
          Complex:  a single precision complex type.
          Zomplex:  a double precision complex type.

     plus the enumeration types given by

          typedef enum { NoTranspose, Transpose, ConjugateTranspose }
                  MatrixTranspose;

          typedef enum { UpperTriangle, LowerTriangle }
                  MatrixTriangle;

          typedef enum { UnitTriangular, NotUnitTriangular }
                  MatrixUnitTriangular;

          typedef enum { LeftSide, RightSide }
                  OperationSide;

     The complex data types are stored in Cartesian form, i.e., as real and
     imaginary parts.  For example

          typedef struct { float real;
                           float imag;
                         } Complex;

          typedef struct { double real;
                           double imag;
                         } Zomplex;
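
     For instance, the product of two double precision complex values
     stored in this form could be computed as in the hypothetical helper
     below (not a library routine; it relies on the Zomplex typedef above):

          /* Multiply two complex values held in Cartesian form. */
          Zomplex zmul(Zomplex x, Zomplex y)
          {
              Zomplex z;
              z.real = x.real * y.real - x.imag * y.imag;
              z.imag = x.real * y.imag + x.imag * y.real;
              return z;
          }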

     The operations performed by the C BLAS are identical to those
     performed by the corresponding Fortran BLAS, as specified in [2], [3]
     and [4].

     To use the C BLAS, link with "libblas".  It is advised to use "f77" to
     load all the required Fortran libraries:
          f77 -o foobar.out foo.o bar.o -lblas

FILES
     /usr/lib/libblas.a
     /usr/lib/libblas_mp.a
     /usr/include/cblas.h

ORIGIN
     The original Fortran source code comes from netlib.

REFERENCES
     [1] S.P. Datardina, J.J. Du Croz, S.J. Hammarling and M.W. Pont, "A
         Proposed Specification of BLAS Routines in C", NAG Technical
         Report TR6/90.

     [2] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
         Algebra Subprograms for Fortran Usage", ACM Trans. on Math. Soft.
         5 (1979) 308-325.

     [3] J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An
         Extended Set of Fortran Basic Linear Algebra Subprograms", ACM
         Trans. on Math. Soft. 14, 1 (1988) 1-32.

     [4] J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of
         Level 3 Basic Linear Algebra Subprograms", ACM Trans. on Math.
         Soft. (Dec. 1989).